An efficient watermarking algorithm for digital audio data in security applications

Transform-domain audio watermarking systems are more robust than time-domain systems. However, the main weakness of these systems is their high computational cost, especially for long-duration audio signals. Therefore, they are not desirable for real-time security applications where speed is a critical factor. In this paper, we propose a fast watermarking system for audio signals operating in the hybrid transform domain formed by the fractional Charlier transform (FrCT) and the dual-tree complex wavelet transform (DTCWT). The central idea of the proposed algorithm is to parallelize the intensive and repetitive steps in the audio watermarking system and then implement them simultaneously on the available physical cores on an embedded systems cluster. In order to have a low power consumption and a low-cost cluster with a large number of physical cores, four Raspberry Pis 4B are used where the communication between them is ensured using the Message Passing Interface (MPI). The adopted Raspberry Pi cluster is also characterized by its portability and mobility, which are required in watermarking-based smart city applications. In addition to its resistance to any possible manipulation (intentional or unintentional), high payload capacity, and high imperceptibility, the proposed parallel system presents a temporal improvement of about 70%, 80%, and 90% using 4, 8, and 16 physical cores of the adopted cluster, respectively.

The main limitations of the existing audio watermarking systems are low robustness, in particular to shifting modification, and high computation cost, especially for long-duration audio.
Several methods address the first limitation by using a synchronization code strategy [12][13][14][15][16][17][18]. With this strategy, synchronization codes are embedded together with the watermark into the host audio signal to mark the positions of the modified samples. In the watermark extraction process, the synchronization codes are found first, and then the watermark bits that follow each synchronization code are extracted. Without using a synchronization code strategy, we proposed in 19 a hybrid approach that is robust to attacks, including shifting attacks, based on the Dual Tree Complex Wavelet Transform (DTCWT) and the Fractional Charlier Transform (FrCT). We embedded the watermark in the host signal by manipulating the coefficients obtained by applying the DTCWT and then the FrCT.
Although the methods mentioned above effectively address robustness against shifting attacks, the execution time of audio watermarking systems in real-time applications remains a challenging problem. In copyright protection applications, the duration of the watermark embedding process may not be a primary concern, but fast watermark extraction is essential 20 . There are several reasons for this 21 . First, in scenarios with real-time content dissemination, such as live streaming or content delivery networks, rapid extraction is indispensable for immediately verifying authenticity and copyright ownership and for promptly detecting and addressing unauthorized usage or distribution. Second, content creators and copyright holders frequently employ automated systems to monitor the use of their intellectual property across digital platforms; timely tracking of copyrighted material, and hence effective enforcement, depends on fast watermark extraction. Third, the pace of watermark extraction significantly affects the user experience, particularly in applications like video streaming or online gaming, where disruptions and latency must be minimized to ensure seamless content consumption. Fourth, as the volume of multimedia content on the internet grows, rapid extraction becomes pivotal for efficiently managing and safeguarding extensive content repositories. Fifth, fast extraction helps deter piracy and curtail unauthorized distribution of copyrighted content: it strengthens the ability to promptly identify infringements and take the necessary legal actions, thereby safeguarding intellectual property rights.
These reasons underscore the significance of fast watermark extraction in audio watermarking systems used for copyright protection. However, most transform-domain audio watermarking systems, such as 3,[9][10][11]19 , are very time-consuming, especially for signals of long duration. These systems operate sequentially (Fig. 1). They divide the audio signal into segments and then apply a set of steps to each segment (preprocessing, switching from the time domain to the transform domain, embedding watermark bits, reconstructing watermarked segments, etc.). Only one segment is processed at a time on a single processor core. Applying transforms and inverse transforms sequentially to audio segments is computationally intensive, mainly for long audio signals and for hybrid approaches that combine multiple transforms.
The main goal of this paper is to design and implement a fast transform-domain audio watermarking system that can run in real time. The central idea of the proposed solution is to parallelize the intensive and repetitive steps of the audio watermarking system and then execute them simultaneously on the available physical cores of a multi-core processor (Fig. 2).
In the realm of parallel computing, various endeavors have been made, particularly in the domain of image processing applications. For instance, Hosny et al. 22 presented a pioneering parallel medical image watermarking scheme, which they successfully deployed on both multi-core CPUs and GPUs. Similarly, Daoui et al. 23 introduced a parallel image encryption algorithm tailored for multi-core CPU architectures. Additionally, researchers in a related study 24 harnessed the parallel processing capabilities of both multi-core CPUs and GPUs to enhance image reconstruction and image classification tasks.
Despite the documented strides made in leveraging parallel computing for computational acceleration in various domains, these efforts have predominantly been confined to conventional personal computing devices (desktops or laptops). Such devices, distinguished by their considerable physical dimensions and weight, possess limited portability. Consequently, their applicability in mobility-constrained environments, encompassing scenarios like transportation modes (e.g., cars, trains, planes, and boats) and smart home or urban infrastructure contexts, has remained largely impractical.
In response to these inherent limitations associated with traditional personal computing systems, the adoption of mobile and portable embedded systems, exemplified by platforms such as Raspberry Pis, has emerged as a viable solution 25 . These embedded systems offer a compelling alternative by virtue of their compact form factor, lower power consumption, and enhanced mobility, making them well-suited for a diverse range of applications and settings.
Parallel processing entails a heightened demand for computational resources, encompassing processor cores and memory, due to the concurrent execution of tasks and the need for efficient workload distribution. Simultaneous execution of multiple tasks necessitates the allocation of dedicated processor cores, while data sharing and synchronization among these tasks amplify the requirement for memory resources. To tackle this computational challenge, we build in this paper a cluster based on several Raspberry Pis for fast, parallel, and distributed audio watermarking. The selection of the Raspberry Pi as our computational platform is motivated by its portability, owing to its light weight (46 g) and compact dimensions (85.6 mm × 56.5 mm), coupled with its low power consumption and affordability. Compared to other versions of the Raspberry Pi, the 4B version with 2 GB of RAM is powerful enough to support complex signal processing applications that require a high computational load. In this paper, a cluster based on four Raspberry Pi 4B boards is built to have a large number of physical cores, which is very useful for reducing the execution time of a parallel watermarking system.
This paper presents a parallel watermarking system for audio signals, implemented on the Raspberry Pi cluster. The proposed approach decomposes the host audio signal and the watermark into as many sub-signals and watermark vectors as there are available cores in the Raspberry Pi cluster. Then, simultaneously on each core, we extract from each sub-signal the low-frequency coefficients that are less sensitive to the human auditory system by applying the 5-level DTCWT. Next, we apply the FrCT 26 with the optimal fractional order in order to improve the imperceptibility and robustness, and we embed the watermark bits by quantizing the energies of the FrCT coefficients. Finally, each core of the Raspberry Pi cluster sends its watermarked sub-signal to the master Raspberry Pi, which combines all these sub-signals to obtain the watermarked audio signal.
Each Raspberry Pi 4B in our cluster has the same input data and the same copy of the instruction script, but each Raspberry Pi executes only a specific part of the script determined by the master Raspberry Pi of the cluster. The Raspberry Pis in the cluster are independent of each other, and communications (sending and receiving data) between them are ensured using the Message Passing Interface (MPI) library 27 .
The watermark extraction process is blind: it requires neither the original audio signal nor the original watermark. We also used a modified Henon map 28 to encrypt the watermark and guarantee security.
The results show that the proposed parallel watermarking system is fast compared to the sequential system, with an improvement of about 70%, 80%, and 90% using 4, 8, and 16 cores of the Raspberry Pi cluster, respectively.
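To relate these percentages to the more familiar speedup factor, the short sketch below assumes that the temporal improvement is defined as (T_seq − T_par) / T_seq, where T_seq and T_par are the sequential and parallel execution times. This definition and the helper names are assumptions made for illustration; they are not stated in the text above.

```python
def speedup_from_improvement(improvement_pct):
    """Convert a relative time reduction (in %) into a speedup factor T_seq / T_par.

    Assumes improvement = (T_seq - T_par) / T_seq * 100 (an assumption, see lead-in).
    """
    remaining = 1.0 - improvement_pct / 100.0  # fraction of the sequential time still needed
    return 1.0 / remaining


def parallel_efficiency(improvement_pct, cores):
    """Speedup divided by the number of cores (1.0 would be ideal linear scaling)."""
    return speedup_from_improvement(improvement_pct) / cores


if __name__ == "__main__":
    # Core counts and percentages as reported for the Raspberry Pi cluster.
    for cores, pct in [(4, 70), (8, 80), (16, 90)]:
        s = speedup_from_improvement(pct)
        e = parallel_efficiency(pct, cores)
        print(f"{cores:2d} cores: speedup ~ {s:.2f}x, efficiency ~ {e:.2f}")
```

Under this assumed definition, improvements of 70%, 80%, and 90% correspond to speedups of roughly 3.3×, 5×, and 10×, respectively.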
In summary, the contributions of this article are as follows.
• A new parallel audio watermarking system implemented on an embedded systems cluster is proposed for the first time.
• The audio watermarking system is fast and is therefore suitable for real-time applications.
• All the Raspberry Pis in the cluster work simultaneously on the audio watermarking system, which reduces the execution time.
• The Raspberry Pi is easily portable owing to its light weight and small size; therefore, the limited portability of standard PCs can be overcome.
The rest of the manuscript is organized as follows: Sections "Discrete fractional Charlier transform", "Dual-tree complex wavelet transform", and "Modified Henon map" present, respectively, the FrCT, the DTCWT, and the modified Henon map, together with their roles in the proposed approach. Section "Raspberry Pi cluster" presents our Raspberry Pi cluster. Section "Proposed parallel audio watermarking system" presents the proposed parallel audio watermarking system. Section "Experiments results" presents the experimental results and discussions, and the conclusion is finally provided in Section "Conclusion".

Discrete fractional Charlier transform
In our previous paper 26 , we proposed the fractional version of the Charlier transform, called the fractional Charlier transform (FrCT), based on the fractional Charlier polynomials (FrCPs) also proposed in the same paper. The FrCT generalizes the classical Charlier transform of integer order to fractional order in order to benefit from the properties of non-integer orders.
The main property of the FrCT that makes it very suitable for digital watermarking is its dependence on the transform order. By adjusting the fractional order in the FrCT, different FrCT coefficients can be obtained. Therefore, we select the optimal fractional order, and the corresponding FrCT coefficients are used as host coefficients in which the watermark is embedded. This approach improves the imperceptibility and robustness of the watermarking system. In addition, the fractional orders in the transform can be used as additional secret keys to improve the security of the watermarking system.
Let x(t), t = 1, 2, ..., N be a one-dimensional signal of finite length N . The one-dimensional fractional Charlier transform of this signal with fractional order α (α ∈ R) is defined as follows: where x is a column vector representation of x(t) , and C α is the fractional Charlier polynomial matrix of size N × N and fractional order α , which is defined as follows: where the eigenvector v k (k = 0, 1, ..., N − 1) of the fractional Charlier polynomial matrix is the kth column of V , and D α is defined as follows: The corresponding inverse transform (iFrCT) can be written as follows:

Dual-tree complex wavelet transform
The DTCWT 29 is an enhanced, expansive version of the DWT. It is implemented as two separate DWTs ( Tree a and Tree b ) applied to the same signal data (Fig. 3). At the heart of the DTCWT is a pair of filters: low pass and high pass. For a DTCWT of level H , the low-pass ( h 0 ) and high-pass ( h 1 ) filters of Tree a generating the approximation The outputs of the DTCWT can be interpreted as complex coefficients as follows: $A^{H} = A_{a}^{H} + jA_{b}^{H}$ and $D^{H} = D_{a}^{H} + jD_{b}^{H}$, where A H are the approximation coefficients of level H and D H are the detail coefficients of level H.
The original signal can be reconstructed without loss of information using inverse DTCWT (iDTCWT) 29 .
The main advantage of the DTCWT for signal processing is shift invariance, which is not ensured by the DWT. Indeed, the DTCWT is approximately shift invariant, which means that small shifts in the input signal do not produce major variations in the energy distribution of the DTCWT coefficients at different levels. To obtain this advantage, the approximation and the detail coefficients of Tree a must be approximate Hilbert transforms of the approximation and the detail coefficients of Tree b , that is, $A_{a}^{H} \approx \mathcal{H}(A_{b}^{H})$ and $D_{a}^{H} \approx \mathcal{H}(D_{b}^{H})$ (6), where $\mathcal{H}$ is the Hilbert transform operator.
In our case, for the first level, we use a set of filters from 30 , and for the other levels, we use a set of filters from 31,32 in order to satisfy the condition of Eq. (6).

Modified Henon map
The modified Henon map, recently proposed in 28 , is a nonlinear chaotic map that is very sensitive to its initial conditions. This chaotic map is defined by Eq. (7). In this paper, the modified Henon map is used to encrypt the watermark information before embedding it into the original host audio signal. This makes the watermark hard for unauthorized persons to extract, which improves the overall security of the audio watermarking system. In addition, encrypting the watermark eliminates the correlation between its bits, which can improve the overall robustness of the proposed watermarking system.
Let W = {w(i), 0 ≤ i < N} be a binary sequence of ones and zeros with N bits. The watermark encryption process is as follows:
(1) Generate a chaotic sequence Y = {y(i), 0 ≤ i < N} using the modified Henon map (Eq. 7).
(2) Binarize the sequence Y using its mean T as the binarization threshold.
(3) Encrypt the watermark from W to W 1 by applying the XOR operation between W and the binarized sequence Y: $W_{1} = W \oplus Y$.
The watermark can be decrypted by applying the XOR operation between the encrypted watermark W 1 and the chaotic sequence Y: $W = W_{1} \oplus Y$. Watermark decryption depends on the initial parameters of the modified Henon map a, b, c, d, x 0 , y 0 . These parameters can be used as a secret key in an audio watermarking system.
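The three encryption steps and the decryption step can be sketched as follows. Because the equation of the modified Henon map is not reproduced here, the sketch substitutes the classical Henon map (with its usual parameters a = 1.4, b = 0.3) as a stand-in keystream generator; the function names, the choice of the y-component, and the strict "greater than the mean" threshold are all assumptions for illustration.

```python
def henon_sequence(n, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """Return n values of the y-component of the CLASSICAL Henon map.

    Stand-in only: the modified Henon map of the paper has extra parameters (c, d).
    """
    x, y = x0, y0
    seq = []
    for _ in range(n):
        # Both right-hand sides use the previous (x, y) thanks to tuple assignment.
        x, y = 1.0 - a * x * x + y, b * x
        seq.append(y)
    return seq


def binarize(seq):
    """Binarize a sequence using its mean as threshold (assumed: 1 if value > mean)."""
    threshold = sum(seq) / len(seq)
    return [1 if v > threshold else 0 for v in seq]


def xor_bits(bits, key_bits):
    """Element-wise XOR of two equal-length bit lists."""
    return [w ^ k for w, k in zip(bits, key_bits)]


if __name__ == "__main__":
    watermark = [1, 0, 1, 1, 0, 0, 1, 0] * 4           # W, N = 32 bits
    key_stream = binarize(henon_sequence(len(watermark)))  # binarized Y
    encrypted = xor_bits(watermark, key_stream)         # W1 = W xor Y
    decrypted = xor_bits(encrypted, key_stream)         # W  = W1 xor Y
    print("round trip ok:", decrypted == watermark)
```

Since XOR is its own inverse, decryption succeeds only when the receiver regenerates the same keystream, which is why the map parameters and initial values act as the secret key.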

Raspberry Pi cluster
Basically, a cluster can be considered as a group of computers acting as a single entity. By combining two or more computers in a cluster, one can achieve a potential increase in performance by performing operations in a distributed and parallel environment. In this paper, we build a cluster using Raspberry Pi embedded systems for fast, parallel, and distributed audio watermarking. This choice is justified by the Raspberry Pi's easy portability due to its light weight (46 g) and small size (85.6 mm × 56.5 mm), its low power consumption, its low cost, and its functionality and scalability. The Raspberry Pi has been used in various domains such as the Internet of Things (IoT) [34][35][36] , image processing [37][38][39][40] , home automation 41,42 , and other applications.
Several versions of the Raspberry Pi computer have been produced by the Raspberry Pi Foundation 43 with an open-source platform. Compared to the previous versions of the Raspberry Pi (3B, 3B+, 2B, 2B+, 1A, and 1B), the Raspberry Pi 4B (Fig. 4) presents a major improvement in terms of processor speed and amount of RAM. The characteristics of the Raspberry Pi 4B are summarized in Table 1. As can be seen in Table 1, the Raspberry Pi 4B is powerful enough to support complex signal processing applications that require a high computational load. In addition, the Raspberry Pi 4B's processor has four physical cores, which is very useful when applications implemented on this processor can run on more than one core. In this paper, a cluster based on four Raspberry Pi 4B boards is built to have a large number of processor cores, which is very useful for reducing the execution time of an audio watermarking system.
Figure 5 shows the architecture of our Raspberry Pi cluster: a main node (Master) controls all operations, and three computing nodes (Node1, Node2, Node3) increase overall performance. To ensure communication between the four Raspberry Pis, we mainly need the MPICH tool. MPICH, used here through the Python wrapper MPI4PY, is an open-source implementation of the MPI standard (Message Passing Interface) 27 , whose purpose is to manage parallel computer architectures. MPI allows the main Raspberry Pi (master node) to distribute the computational task in parallel among all the other Raspberry Pis in the cluster.
After installing the same Raspbian OS and the same applications and libraries on all the Raspberry Pis, we configure their hostnames, and then we get their IP addresses.Finally, we authorize the master Raspberry Pi to connect to the other Raspberry Pis via SSH (Secure Shell) without a password.
Figure 6 shows the execution result of a simple Python script sent by the master Raspberry Pi to the other Raspberry Pis using MPI. Each of the 16 processor cores on the network had to report to the master to confirm that all cores were working properly.
It is essential to highlight that the master node may occasionally undergo an automatic restart when handling computationally-intensive tasks, especially if the power source is inadequate.To mitigate this issue, we employed power sources capable of delivering a stable current range of 2.0 to 2.5 amperes to each Raspberry Pi in our cluster.
The choice of cluster computing in this paper over other computing techniques, such as cloud computing, is rooted in several key considerations. Firstly, data privacy and security are paramount in our audio watermarking system. Cluster computing allows us to maintain full control over our data, keeping it within our network. This level of control mitigates potential risks associated with relying on cloud-based storage and processing, where data may be exposed to external vulnerabilities. Furthermore, the nature of audio watermarking demands low-latency communication to ensure real-time processing. Cluster computing excels in this regard, as it involves physically proximate nodes, reducing communication latency significantly compared to the internet-based data transfer typical of cloud computing. This low-latency advantage is critical for the timely execution of audio watermarking tasks. Additionally, audio watermarking is a computationally intensive process that requires tailored hardware and software configurations for optimal performance. Cluster computing offers us the flexibility to fine-tune these configurations to meet the demands of our task. In contrast, cloud computing often involves shared resources, making it less customizable and potentially less efficient for our resource-intensive processing needs. Lastly, in terms of long-term cost-efficiency, cluster computing is the preferred choice. Unlike cloud computing, which often incurs recurring service fees, cluster computing allows us to leverage our existing hardware investments without incurring ongoing expenses. This cost-saving aspect aligns well with our project's budgetary constraints.

Proposed parallel audio watermarking system
In order to reduce the execution time, we propose a parallel audio watermarking system that can be implemented on the Raspberry Pi cluster. The proposed approach (Fig. 7) decomposes the host audio signal and the watermark into as many sub-signals and watermark vectors as there are available cores in the Raspberry Pi cluster. Then, simultaneously on each core, we extract from each sub-signal the low-frequency coefficients by applying the 5-level DTCWT. Next, we apply the FrCT with the optimal fractional order in order to improve the imperceptibility and robustness, and we embed the watermark bits by quantizing the energies of the FrCT coefficients. Finally, each core of the Raspberry Pi cluster sends its watermarked sub-signal to the master node, which combines all these sub-signals to obtain the watermarked audio signal.
The watermark extraction process in the proposed system needs neither the original audio signal nor the original watermark (the extraction is blind). The watermarked audio signal is decomposed into sub-signals, and each sub-signal is sent to a single core of the Raspberry Pi cluster. Then, each sub-signal is again subjected to the DTCWT and the FrCT before the watermark bits are extracted. Finally, each core in the Raspberry Pi cluster sends its watermark bits to the master node, which combines these bits to recover the watermark. Each Raspberry Pi in our cluster has the same input data and the same instruction script, but each Raspberry Pi executes only a specific part of the script determined by the master Raspberry Pi of the cluster. All Raspberry Pis in the cluster are independent of each other, and communications (sending and receiving data) between the master Raspberry Pi and the other Raspberry Pis are ensured using the MPI library.
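The data flow described above (split on the master, process on each core, recombine on the master) can be sketched without MPI using plain Python lists. The helper names and the placeholder per-core function are hypothetical; in an MPI4PY program the distribution and collection would typically be done with `comm.scatter` and `comm.gather` with the master as root, which is not shown here so that the sketch runs on a single machine.

```python
def split_equal(seq, m):
    """Split seq into m contiguous chunks of equal length.

    Assumes len(seq) is a multiple of m, as implied by "equal-length" sub-signals.
    """
    if len(seq) % m != 0:
        raise ValueError("sequence length must be a multiple of the number of cores")
    size = len(seq) // m
    return [seq[k * size:(k + 1) * size] for k in range(m)]


def merge(chunks):
    """Concatenate the chunks returned by the cores, in core order."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out


def process_on_core(sub_signal, watermark_vector):
    """Hypothetical placeholder for the per-core work (DTCWT, FrCT, embedding).

    It returns the sub-signal unchanged; only the data flow is illustrated.
    """
    return list(sub_signal)


if __name__ == "__main__":
    cores = 4                                   # M, e.g. one Raspberry Pi 4B
    signal = [0.01 * i for i in range(32)]      # toy host signal S
    watermark = [1, 0, 1, 1, 0, 0, 1, 0]        # toy encrypted watermark W1

    sub_signals = split_equal(signal, cores)    # S_k, k = 0..M-1
    vectors = split_equal(watermark, cores)     # V_k, k = 0..M-1
    results = [process_on_core(s, v) for s, v in zip(sub_signals, vectors)]
    watermarked = merge(results)                # done by the master node
    print(len(sub_signals), "sub-signals,", len(watermarked), "samples recombined")
```

Keeping the chunks contiguous and merging them in core order is what lets the master rebuild the signal without any per-sample bookkeeping.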
The following sections detail the embedding and extraction processes.

Embedding process
Let S = {s(i), 0 ≤ i < L} denote a host audio signal with L samples, and let W = {w(i) ∈ {0, 1}, 0 ≤ i < N} be a binary sequence of N bits to be embedded within the host audio signal. The watermark embedding process can be summarized as follows.
Step 1: Encrypt the watermark from W to W 1 = {w 1 (i), 0 ≤ i < N} using the modified Henon map-based encryption procedure (Section "Modified Henon map"). The initial parameters of the modified Henon map a, b, c, d, x 0 , y 0 , labelled KEY , are used as a secret key in our audio watermarking system.
Step 2: Divide W 1 into M equal-length vectors V k , k = 0, 1, ..., M − 1.
Step 3: Divide the audio signal S into M equal-length sub-signals S k , k = 0, 1, ..., M − 1, where M represents the number of available cores in the Raspberry Pi cluster. Each core of the Raspberry Pi cluster (core k , k = 0, 1, ..., M − 1) receives the watermark vector V k and the sub-signal S k and then executes Steps 4 to 11.
Step 5: Generate A 5 and D 5 , D 4 , D 3 , D 2 , D 1 by applying the 5-level DTCWT, where A 5 = A 5 a + jA 5 b are the approximation coefficients of level 5 and D i = D i a + jD i b , i = 1, 2, 3, 4, 5 are the detail coefficients of level i.
Step 6: Apply the FrCT to A 5 to produce the vector FrCM α (Eq. 12), where FrCM α and A 5 are 1 × n vectors, n = L/(N × 2^5), and C α is the n × n matrix of FrCPs, which can be calculated from Eq. (2). In this paper, the fractional order in the FrCT is set to α = 0.2 as recommended in 19 .
Step 7: Calculate the energy of FrCM α , denoted E.
Step 8: Embed the watermark bit in the vector FrCM α by quantizing its energy E , which yields the watermarked FrCT vector. Here, Δ is the quantization step and floor(.) is the floor operator.
Step 9: Apply the iFrCT to the watermarked FrCT vector to obtain the watermarked approximation coefficients A 5 .
Step 10: Obtain the watermarked frame F j by applying the iDTCWT to the watermarked A 5 and D 5 , D 4 , D 3 , D 2 , D 1 .
Step 11: Reconstruct the watermarked sub-signal S k from the watermarked frames.
Step 12: Each core of the cluster (core k , k = 0, 1, ..., M − 1) sends the watermarked sub-signal S k to core 0 , which combines all these sub-signals to obtain the watermarked audio signal S.
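Steps 7 and 8 can be illustrated with a generic energy-quantization rule. The exact embedding equation is not reproduced here, so the sketch below uses a common scheme as an assumption: the energy is moved to the lower quarter of its quantization cell for bit 0 and to the upper quarter for bit 1, and the coefficient vector is rescaled accordingly. The function names and the 1/4 and 3/4 offsets are illustrative, not the authors' stated rule.

```python
import math


def energy(vec):
    """Sum of squared magnitudes; works for real or complex coefficients."""
    return sum(abs(c) ** 2 for c in vec)


def embed_bit(vec, bit, delta):
    """Rescale vec so that its energy encodes one bit (assumed quantization rule)."""
    e = energy(vec)
    if e == 0:
        raise ValueError("cannot embed in a zero-energy vector")
    base = math.floor(e / delta) * delta           # lower edge of the current cell
    offset = 0.75 * delta if bit else 0.25 * delta
    target = base + offset                         # quantized energy
    scale = math.sqrt(target / e)                  # energy scales with scale**2
    return [c * scale for c in vec]


def extract_bit(vec, delta):
    """Blind extraction: read the position of the energy inside its cell."""
    e = energy(vec)
    residue = e - math.floor(e / delta) * delta
    return 1 if residue >= delta / 2 else 0


if __name__ == "__main__":
    coeffs = [0.9, -0.4, 0.3 + 0.2j, 0.1]          # toy stand-in for FrCM
    for bit in (0, 1):
        marked = embed_bit(coeffs, bit, delta=0.2)
        print("embedded", bit, "-> extracted", extract_bit(marked, delta=0.2))
```

With this rule the energy changes by less than Δ, and the extracted bit survives any disturbance that shifts the energy by less than Δ/4, which illustrates why a larger Δ favors robustness while a smaller Δ favors imperceptibility.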

Extraction process
Let S = {s(i), 0 ≤ i < L} denote a watermarked audio signal with L samples. The extraction of the watermark from S is blind and can be summarized as follows.
Step 1: Divide the watermarked audio signal S into M equal-length sub-signals S k , where M represents the number of Raspberry Pi cluster cores. Each core of the Raspberry Pi cluster (core k , k = 0, 1, ..., M − 1) receives the sub-signal S k and then executes the following steps.
Step 2: Decompose the sub-signal S k into J equal-length frames, where J = N/M.
Step 4: Each core in the Raspberry Pi cluster ( core k , where k = 0, 1, ..., M − 1 ) sends its vector V * k to core 0 , which then combines these vectors to obtain the encrypted watermark sequence W * 1 .
Step 5: Decrypt W * 1 using the same initial parameters ( KEY ) of the modified Henon map to recover the watermark W * .

Experiments results
The proposed parallel audio watermarking system is implemented and evaluated using the Python programming language. The system runs on the Raspberry Pi cluster presented in Section "Raspberry Pi cluster", which is composed of four Raspberry Pi 4B boards, each equipped with 2 GB of RAM and a 16 GB SD card for local storage. Each Raspberry Pi 4B has the same input data (audio signal and watermark) and the same copy of the instruction script, but each node runs only a specific part of the script determined by the master Raspberry Pi of the cluster.
Five audio signals of different types and lengths from (https://www.looperman.com/loops) were used as test audio signals (Table 2), and a binary sequence of ones and zeros was used as a watermark. The length of the watermark depends on the duration of the host audio signal. A single bit of the watermark is embedded in the host signal every 486 samples, covering the whole host signal.
The performance of the proposed audio watermarking system is compared with that of seven notable audio watermarking systems, each chosen for specific reasons. These systems, namely FrCT-DTCWT 19 , DWT-DTMT 10 , DWT-DCT 9 , DCT-SVD 17 , DWT 3 , DCT 5,16,18 , and SVD 7 , were selected based on considerations such as the

Payload capacity
The payload capacity determines the quantity of information that can be inserted into the host signal while maintaining imperceptibility. Let B be the number of bits embedded into an audio signal of duration d in seconds. Payload capacity is defined as $P = B/d$. The payload capacity P is measured in bps (bits per second). According to the International Federation of the Phonographic Industry (IFPI) 44 , the payload capacity must be at least 20 bps for any audio watermarking system. The payload capacity of the proposed system, shown in Table 3, is well above this threshold and therefore satisfies the IFPI condition.
From the comparison results in Table 4, we can see that the proposed system provides a high average payload (91.1926 bps), which is much higher than the 20 bps recommended by the IFPI. The average payload of our system is higher than that of 9,[16][17][18] , but it is lower than that of the other systems selected for this comparison. This is because the payload of the proposed system was set to a sufficient and acceptable value in order to favor imperceptibility. Indeed, imperceptibility is the main requirement of any audio watermarking system; if the watermarked audio signal is not of good quality, it will be accepted neither by the industry nor by users.
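As a quick sanity check of the payload definition, the sketch below computes the payload for one watermark bit per 486 samples. The sampling rate of 44.1 kHz and the 10 s duration are assumptions chosen for illustration, as are the function names; the actual test signals are those listed in Table 2.

```python
def bits_for_signal(num_samples, samples_per_bit=486):
    """Number of whole watermark bits that fit when one bit uses samples_per_bit samples."""
    return num_samples // samples_per_bit


def payload_bps(num_bits, duration_s):
    """Payload capacity P = B / d in bits per second."""
    return num_bits / duration_s


if __name__ == "__main__":
    sample_rate = 44100          # assumed sampling rate in Hz (not stated in the text)
    duration = 10.0              # assumed duration in seconds
    n_samples = int(sample_rate * duration)
    bits = bits_for_signal(n_samples)
    print(f"{bits} bits in {duration:.0f} s -> {payload_bps(bits, duration):.1f} bps")
```

Under these assumptions the payload is about 90.7 bps, the same order as the average reported in Table 4 and well above the 20 bps IFPI threshold.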

Imperceptibility
To measure the imperceptibility of the watermarked audio signals, the signal-to-noise ratio (SNR) 9 is adopted. It evaluates the quality of the watermarked audio signal by measuring the objective similarity between the original host signal S = {s(i), 0 ≤ i < L} and the watermarked one $\tilde{S} = \{\tilde{s}(i), 0 \le i < L\}$. A larger SNR indicates that the watermarked audio signal more closely resembles the original audio signal, which means that the watermark is more imperceptible. The SNR is calculated as $\mathrm{SNR}(S, \tilde{S}) = 10 \log_{10} \left( \sum_{i=0}^{L-1} s(i)^{2} \Big/ \sum_{i=0}^{L-1} \left( s(i) - \tilde{s}(i) \right)^{2} \right)$. According to the IFPI 44 , the SNR must be at least 20 dB for the watermarked audio signal to be considered imperceptible.
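A minimal sketch of the SNR computation is given below, using the standard ratio of signal energy to distortion energy in decibels. The function name and the toy signals are illustrative assumptions.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB between a host signal and its watermarked version."""
    if len(original) != len(watermarked):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((s - w) ** 2 for s, w in zip(original, watermarked))
    if noise_energy == 0:
        return math.inf                     # identical signals: no distortion
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    host = [1.0, -1.0, 1.0, -1.0]
    marked = [1.1, -1.0, 1.0, -1.0]         # one sample perturbed by 0.1
    print(f"SNR = {snr_db(host, marked):.2f} dB")
```

In this toy case the signal energy is 4 and the distortion energy is 0.01, so the SNR is 10·log10(400), about 26 dB, which would pass the 20 dB IFPI threshold.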
In our system, we embedded the watermark bits by quantizing the energies of the FrCT coefficients. In general, in quantization-based audio watermarking systems, imperceptibility and robustness are influenced by the value of the quantization step Δ. A larger quantization step lowers the quality of the watermarked audio, while a smaller quantization step reduces the robustness of the watermark. In order to obtain an appropriate value of Δ, experiments were performed for different host audio signals. The binary watermark is embedded in the host audio signals with different quantization steps Δ. For each quantization step Δ, the SNR values are calculated and then plotted against Δ in Fig. 8. As expected, the SNRs decrease with increasing Δ. This is because a larger Δ moves the energies of the FrCT coefficients (where the watermark bits are embedded) farther from their original values, which distorts the original audio signals. This figure also shows that the step Δ = 0.2 gives an SNR greater than 30 dB for the different signals, which comfortably satisfies the IFPI recommendation. Thus, this step value is used in the following experiments.
Figure 9 shows the original audio signals and the watermarked versions using Δ = 0.2, and the corresponding SNR values are listed in Table 5.These results clearly show that the proposed system satisfies the requirements of the IFPI with an SNR greater than 20 dB for different audio signals, and it can be increased up to 33.5 dB depending on the type of the host signal.
The comparison results presented in Table 6 clearly show that the proposed system can achieve high imperceptibility (32.21 dB), which is much higher than the 20 dB recommended by IFPI.The average imperceptibility of our system is higher than that of most other systems selected for comparison.Note that the average imperceptibility of our system is lower than that of 5 because we chose a relatively large quantization step in order to have good robustness.The advantage of this choice will be clearly demonstrated in the next section.

Robustness against common signal processing manipulations
Watermarked audio signals are frequently subjected to common signal processing manipulations. These manipulations can modify the frequency content and dynamics of the host audio signal and, as a result, deform the embedded watermark. In addition, third parties may attempt to modify the watermarked audio signal to prevent extraction of the embedded watermark.
To evaluate the robustness of the watermark against different common signal processing manipulations, the Bit Error Rate (BER) 12 is used as an objective criterion in this paper. Mathematically, BER is defined as the number of erroneously extracted watermark bits divided by the total number of watermark bits. BER measures the similarity between the original watermark and the extracted one. BER is a number in the range [0, 1]. If BER is equal to 0, then the extracted watermark is exactly the same as the original one. If it is equal to 1, then every extracted bit differs from the original one, i.e., the extraction process has failed.
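As a small illustration of this criterion, the following sketch computes BER as the fraction of mismatched bits between two equal-length bit sequences. The function name `bit_error_rate` is hypothetical and the watermarks are assumed to be given as sequences of 0s and 1s.

```python
def bit_error_rate(original_bits, extracted_bits):
    """Fraction of positions where the extracted bit differs from the original."""
    if len(original_bits) != len(extracted_bits):
        raise ValueError("watermarks must have the same length")
    if len(original_bits) == 0:
        raise ValueError("watermarks must not be empty")
    errors = sum(a != b for a, b in zip(original_bits, extracted_bits))
    return errors / len(original_bits)

original = [1, 0, 1, 1, 0, 0, 1, 0]
one_flip = [1, 0, 1, 0, 0, 0, 1, 0]   # a single erroneous bit out of eight
print(bit_error_rate(original, original))  # identical watermarks
print(bit_error_rate(original, one_flip))
```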
The robustness of the proposed watermarking system is evaluated against common signal processing manipulations and attacks. The results are as follows:

Robustness without signal processing manipulations
The present test is performed to verify the effectiveness of the proposed system in recovering the watermark from watermarked audio signals without any applied manipulation. The extraction results for different audio signals (Table 2) are presented in Table 7. This table shows that the BER values are zero for all audio signals, which confirms that the watermark is recovered exactly in the absence of any manipulation. However, further tests are needed to validate the robustness of our system against common signal processing manipulations and attacks.

Robustness to AWGN
When transmitting watermarked audio signals to a radio station via a communication channel, these signals may be affected by noise. Therefore, it is necessary to test the robustness of the proposed system in noisy environments. In this context, additive white Gaussian noise (AWGN) is applied to the watermarked audio signals with SNR equal to 30 dB, 20 dB, and 18 dB. Then, the extraction process is applied to recover the embedded watermark from the noisy watermarked signals. The extraction results for the five audio signals are presented in Table 8. These results indicate that the proposed method is able to extract the watermark perfectly even with AWGN addition, and the BER values remain zero for different watermarked signals. Therefore, the proposed method is effectively resistant to noise addition.
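The noise attack described above can be reproduced with a short sketch that scales Gaussian noise to reach a target SNR. The function names `add_awgn` and `measured_snr_db` are hypothetical, and a 440 Hz sine stands in for a watermarked signal.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that the result has roughly the given SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def measured_snr_db(clean, noisy):
    """SNR actually obtained, computed from the clean and noisy signals."""
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))

# Stand-in for a watermarked signal: one second of a 440 Hz tone at 44,100 Hz.
rng = np.random.default_rng(1)
t = np.arange(44100) / 44100.0
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)

for target in (30, 20, 18):
    noisy = add_awgn(tone, target, rng)
    print(target, round(measured_snr_db(tone, noisy), 2))
```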

Robustness to resampling and requantizing
Resampling and requantizing are common signal processing manipulations that change the format of the watermarked signals. During the experiment, the watermarked signals were first down-sampled to 8000 Hz, 11,025 Hz, and 22,050 Hz, and then up-sampled back to 44,100 Hz. Second, the watermarked signals were requantized to 24 bits/sample and 8 bits/sample, and then back to 16 bits/sample. The extraction results of these manipulations are presented in Table 9. We can observe from this table that the BERs are zero for different watermarked signals, which shows that the proposed method can effectively resist resampling and requantizing.
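The requantizing part of this test can be sketched as follows, assuming samples normalized to [-1, 1) and a uniform signed quantizer. The function name `requantize` is hypothetical. Resampling is left out of the sketch because it requires a properly anti-aliased resampler.

```python
import numpy as np

def requantize(signal, bits):
    """Round samples in [-1, 1) onto a signed grid with `bits` bits per sample."""
    levels = 2 ** (bits - 1)
    q = np.clip(np.round(signal * levels), -levels, levels - 1)
    return q / levels

# Stand-in for a 16-bit watermarked signal.
t = np.arange(44100) / 44100.0
x16 = requantize(0.5 * np.sin(2.0 * np.pi * 440.0 * t), 16)

# 16 bits -> 8 bits -> back to 16 bits, as in the attack described above.
x8 = requantize(x16, 8)
back16 = requantize(x8, 16)

# The information discarded at 8 bits is not restored by going back to 16 bits.
print("max error after round trip:", np.max(np.abs(back16 - x16)))
```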

Robustness to signal filtering
Filters are often used in signal processing to cut or remove certain sub-bands of the audio spectrum. We therefore evaluate the robustness of the proposed system against signal filtering. The watermarked audio signals are filtered by low-pass filtering with cutoff frequencies of 4 kHz and 500 Hz, and by high-pass filtering with a cutoff frequency of 200 Hz. The results of the filtering manipulations (Table 10) show that the BER values are lower than 1.60% for different filtered watermarked signals, which indicates that the proposed algorithm has strong robustness against the filtering manipulations.
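The filter type used in these tests is not specified here, so the sketch below assumes an ideal (brick-wall) filter applied in the FFT domain purely for illustration. A practical attack would more likely use an FIR or IIR design. The function name `fft_filter` is hypothetical.

```python
import numpy as np

def fft_filter(signal, sample_rate, cutoff_hz, kind="low"):
    """Ideal low-pass or high-pass filter implemented by zeroing FFT bins."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    keep = freqs <= cutoff_hz if kind == "low" else freqs >= cutoff_hz
    return np.fft.irfft(spectrum * keep, n=len(signal))

fs = 44100
t = np.arange(fs) / fs
low_tone = np.sin(2.0 * np.pi * 1000.0 * t)    # inside the 4 kHz pass band
high_tone = np.sin(2.0 * np.pi * 8000.0 * t)   # above the 4 kHz cutoff
mixture = low_tone + high_tone

low_passed = fft_filter(mixture, fs, 4000.0, kind="low")
high_passed = fft_filter(mixture, fs, 200.0, kind="high")
print("residual after low-pass:", np.max(np.abs(low_passed - low_tone)))
```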

Robustness to echo addition
In this experiment, the robustness against echo addition is tested. We added to the watermarked signals an echo with a delay of 50 ms and a decay of 5%, and an echo with a delay of 300 ms and a decay of 40%. Table 11 presents the BERs after these modifications. The extraction results show that the BERs are zero for all watermarked signals, which indicates that the proposed algorithm has strong robustness against echo addition.
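A minimal sketch of the echo attack is given below. It assumes that "decay" denotes the amplitude of the delayed copy relative to the original (5% is taken as a factor of 0.05), which is an interpretation rather than something stated in the text. The function name `add_echo` is hypothetical.

```python
import numpy as np

def add_echo(signal, sample_rate, delay_ms, decay):
    """Add a single delayed, attenuated copy of the signal to itself."""
    delay = int(round(sample_rate * delay_ms / 1000.0))
    out = signal.astype(float).copy()
    if 0 < delay < len(signal):
        out[delay:] += decay * signal[:-delay]
    return out

# A unit impulse makes the echo easy to inspect.
impulse = np.zeros(44100)
impulse[0] = 1.0

short_echo = add_echo(impulse, 44100, delay_ms=50, decay=0.05)
long_echo = add_echo(impulse, 44100, delay_ms=300, decay=0.40)
print(np.nonzero(short_echo)[0], np.nonzero(long_echo)[0])
```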

Robustness to MP3 compression
Signal compression is often applied to audio signals during processing to reduce the size of audio files. We test, in Table 12, the robustness of the proposed system when the watermarked signal format is changed from WAVE to MP3 and back to WAVE by applying MPEG-1 Layer 3 compression at 128 kbps, 112 kbps, 64 kbps, and 32 kbps. As seen from this table, the BERs of the proposed system remain below 11.40% even when MP3 compression at 32 kbps is applied. This means that the proposed system provides good performance under MP3 compression manipulations.

Robustness to amplitude scaling
The robustness of the proposed system is also tested against amplitude scaling manipulation. We scaled the amplitude of the watermarked audio signals with factors of 1.2, 1.1, 0.9, and 0.8, and then the extraction process was applied to recover the embedded watermark. Table 13 presents the extraction results in terms of BER. These results show that the BERs are zero, which indicates that the proposed system has strong robustness against amplitude scaling manipulation.

Robustness to cropping
In this experiment, the robustness against cropping manipulation is tested. Cropping is a manipulation frequently applied by third parties to modify watermarked signals in order to distort the embedded watermark. In this test, 10%, 20%, 30%, and 40% of the samples of the watermarked signals are randomly replaced by zeros. The results for the watermarked classical, rap, jazz, pop, and rock signals are given in Table 14. They indicate that the proposed system has strong robustness against cropping manipulation, with a BER that does not exceed 1.2% for random cropping of 40%.
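The cropping attack can be sketched as below, assuming that individual sample positions are drawn uniformly at random without replacement. The text does not say whether isolated samples or contiguous blocks were zeroed, so this choice is an assumption, and the function name `random_crop` is hypothetical.

```python
import numpy as np

def random_crop(signal, fraction, rng):
    """Replace a given fraction of randomly chosen samples by zeros."""
    out = signal.copy()
    n_zero = int(round(fraction * len(signal)))
    idx = rng.choice(len(signal), size=n_zero, replace=False)
    out[idx] = 0.0
    return out

rng = np.random.default_rng(3)
ones = np.ones(1000)   # a signal with no zeros makes the effect countable

for fraction in (0.1, 0.2, 0.3, 0.4):
    cropped = random_crop(ones, fraction, rng)
    print(fraction, int(np.sum(cropped == 0.0)))
```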

Robustness to shifting
Shifting is another manipulation that can be used to distort the embedded watermark by shifting the watermarked audio signal by a specified number of samples to the right or to the left. In this test, the performance of the proposed system is tested under signal shifting: the watermarked audio signal is shifted to the right by 5, 10, 20, 50, 100, and 150 samples, and then the extraction process is applied to recover the embedded watermark. Table 15 shows that the proposed system achieves excellent robustness against shifting manipulation when 5, 10, 20, and 50 samples are shifted, and acceptable robustness when 100 and 150 samples are shifted, with BERs less than 0.152 (15.2%). This is expected because the DTCWT adopted by the proposed system provides approximate shift invariance.
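For illustration, a right shift by a fixed number of samples can be sketched as follows. The sketch assumes that the vacated positions are filled with zeros and that the signal length is kept constant by dropping the samples pushed past the end. The text does not specify this padding convention, and the function name `shift_right` is hypothetical.

```python
import numpy as np

def shift_right(signal, n_samples):
    """Delay the signal by `n_samples`, zero-padding the start and keeping the length."""
    out = np.zeros_like(signal)
    if n_samples < len(signal):
        out[n_samples:] = signal[:len(signal) - n_samples]
    return out

x = np.arange(1.0, 6.0)      # [1, 2, 3, 4, 5]
print(shift_right(x, 2))     # first two positions become zero
```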

Robustness to TSM
Time Scale Modification (TSM) is a digital signal processing technique used to speed up or slow down the playback of an audio signal without altering its pitch. TSM can be used for various purposes, such as adjusting the duration of music recordings to ensure synchronous playback or synchronizing an audio signal with a given video clip. In this test, we evaluate the performance of the proposed system when subjected to TSM. TSM is applied to the watermarked audio with degrees of modification ranging from −5% to +5%. The extraction process is then executed to recover the embedded watermark. The results, in terms of BER, are presented in Table 16. They indicate that the proposed system maintains low BER values, all of which are less than 11%, even when subjected to TSM.
The robustness of the proposed audio watermarking system is evaluated through a comparative analysis with nine state-of-the-art audio watermarking systems. The results of this comparison are summarized in Table 17. Overall, the proposed system compares favorably with the systems in 7,9,10,16–19, and is only slightly less effective than 3 in terms of MP3 compression resistance, as well as 17 in terms of TSM resistance. Our method, along with methods 16,17,19, stands out for its resistance to shift attacks. This resistance can be attributed to the use of the DTCWT in our method and in method 19, which provides an approximate shift invariance. Methods 16,17 also achieve significant resistance through a synchronization mechanism.

Time complexity analysis
Transform-domain watermarking systems are more robust than those implemented in the time domain. However, the major disadvantage of transform-domain watermarking systems is that they are time-consuming, especially for long-duration audio signals. Table 18 shows the elapsed execution time of the proposed audio watermarking system implemented on the Raspberry Pi using the sequential approach. This test was performed using different audio signals (Classical, Pop, Jazz, Rap, Rock) with durations ranging from 60 to 180 s. As shown in this table, the execution time of the embedding and extraction processes is very high, and it increases with increasing signal duration. Figure 10 shows the time required for the different steps of the proposed system using the "Classical" audio signal of 60 s. As shown in this figure, the most computationally intensive steps are the computation of the FrCT and DTCWT transforms and their inverse transforms iFrCT and iDTCWT. This leads to slow embedding and extraction processes.
As highlighted in Section "Proposed parallel audio watermarking system", both the embedding and extraction processes can be parallelized using the MPI library on the Raspberry Pi cluster to reduce the execution time of the proposed audio watermarking system. Each Raspberry Pi 4B in our cluster (Section "Raspberry Pi cluster") has the same input data and the same copy of the script, but each node executes only a specific part of the script determined by the master Raspberry Pi of the cluster. All Raspberry Pis in the cluster are independent of each other, and communications (sending and receiving data) between the master node and the other nodes are ensured using MPI. Figure 11 presents the execution times required by the proposed parallel audio watermarking system implemented on our Raspberry Pi cluster. This test was performed on different numbers of cluster cores and different audio signals. This figure shows the superiority of the proposed parallel system implemented on the cluster compared to the sequential system implemented on a single Raspberry Pi of the cluster. The efficiency of the proposed approach increases with the number of cores used in the cluster.
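The idea that every node holds the same data but processes only its own part can be sketched with a simple block partition of audio frames. This is an assumed scheme for illustration, not the authors' scheduling logic, and the function name `frames_for_rank` is hypothetical. With mpi4py, `rank` and `size` would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`. Here they are plain integers so that the sketch runs without an MPI installation.

```python
def frames_for_rank(n_frames, rank, size):
    """Contiguous block of frame indices assigned to one process.

    The first `n_frames % size` ranks receive one extra frame, so block
    sizes differ by at most one.
    """
    base, extra = divmod(n_frames, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)

# Example: 10 frames shared among 4 processes.
n_frames, size = 10, 4
for rank in range(size):
    print(rank, list(frames_for_rank(n_frames, rank, size)))
```

Each process would then transform and embed only its own frames, and the master would gather the partial results in rank order to rebuild the watermarked signal.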
In order to measure the effectiveness of the proposed approach, we use the Execution Time Improvement Ratio (ETIR) 45, which compares the execution time of the sequential watermarking system with the execution time of the parallel watermarking system implemented on the Raspberry Pi cluster. ETIR is defined as follows:

$$\mathrm{ETIR} = \left(1 - \frac{T_{\mathrm{parallel}}}{T_{\mathrm{sequential}}}\right) \times 100\%$$

where $T_{\mathrm{sequential}}$ and $T_{\mathrm{parallel}}$ denote the execution times of the sequential and parallel systems, respectively. The obtained ETIR values of the proposed parallel system on different numbers of cores of the Raspberry Pi cluster are presented in Table 19. This table shows that the proposed parallel system is considerably faster than the sequential system, with time improvements of about 70%, 80%, and 90% using 4, 8, and 16 cores of the cluster, respectively, which demonstrates the effectiveness of the proposed method in terms of speed.
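To make these percentages more concrete, the sketch below assumes that ETIR is the relative reduction in execution time, (1 − T_parallel / T_sequential) × 100, and converts an ETIR value into the equivalent speedup factor. The function names and the timing values in the example are hypothetical and are not taken from the tables.

```python
def etir(t_sequential, t_parallel):
    """Relative execution-time reduction in percent (assumed ETIR form)."""
    return (1.0 - t_parallel / t_sequential) * 100.0

def speedup_from_etir(etir_percent):
    """Speedup factor T_sequential / T_parallel implied by an ETIR value."""
    return 1.0 / (1.0 - etir_percent / 100.0)

# Hypothetical timings: 10 s sequential, 3 s parallel.
print(etir(10.0, 3.0))

# Improvements of about 70%, 80%, and 90% correspond to these speedups.
for value in (70.0, 80.0, 90.0):
    print(value, round(speedup_from_etir(value), 2))
```

Under this assumed definition, improvements of about 70%, 80%, and 90% correspond to the parallel system running roughly 3.3, 5, and 10 times faster than the sequential one.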
The comparison results presented in Table 20 show the substantial performance advantage of the proposed parallel system implemented on our multi-core Raspberry Pi cluster over its sequential counterparts executed on other computing platforms, namely the AMD Ryzen 5 PC and the Intel Core i3 PC. For a 60-s audio signal with a 5438-bit watermark, the proposed system achieves execution times of just 7.6210 s, 4.0680 s, and 2.3197 s when deployed on 4, 8, and 16 cores of the Raspberry Pi cluster, respectively. In contrast, the same computation requires 49.2675 s on the AMD Ryzen 5 PC and 19.6285 s on the Intel Core i3 PC. Furthermore, this performance improvement
coefficients $A_a^{H}$ (low frequencies) and the detail coefficients $D_a^{H}, D_a^{H-1}, \ldots, D_a^{1}$ (high frequencies). Similarly, the approximation coefficients $A_b^{H}$ and the detail coefficients $D_b^{H}, D_b^{H-1}, \ldots, D_b^{1}$ are generated by the low-pass and high-pass filters of Tree b $\{g_0, g_1\}$.

Figure 5. (a) Architecture of our Raspberry Pi cluster; (b) close-up of our Raspberry Pi cluster.

Figure 6. Basic functionality testing of our cluster.

Figure 7. The flowchart of the proposed parallel audio watermarking scheme implemented on the RPi cluster.

Figure 10. The sequential execution time of the proposed watermarking system for the classical audio signal with a duration of 60 s: (a) embedding process, (b) extraction process.

Figure 11. Execution time (in seconds) of the parallel audio watermarking system implemented on our Raspberry Pi cluster: (a) embedding process, (b) extraction process.

Table 2. Information on the test audio signals.

Table 3. Payload capacity for different audio signals.

Table 4. Comparison with six audio systems cited in the literature in terms of payload capacity.

Table 5. SNR for different watermarked audio signals.

Table 6. Comparison with six audio systems cited in the literature in terms of imperceptibility.

Table 7. Robustness results without applying signal processing manipulations.

Table 8. Robustness results (BER) in the case of noise addition.

Table 13. Robustness results (BER) in the case of amplitude scaling.

Table 14. Robustness results (BER) in the case of cropping.

Table 15. Robustness results (BER) in the case of shifting.

Table 17. Comparison with six audio systems cited in the literature in terms of robustness.

Table 18. Sequential execution times of the proposed audio watermarking system implemented on a single Raspberry Pi 4B of the cluster.

Table 19. Temporal improvement (ETIR) using the proposed parallel audio watermarking system implemented on our Raspberry Pi cluster for various audio signals.

Table 20. Average execution times for a 60 s audio signal with a 5438-bit watermark in the proposed parallel watermarking system and existing systems.